CCLasso: correlation inference for compositional data through Lasso

Authors

  • Huaying Fang
  • Chengcheng Huang
  • Hongyu Zhao
  • Minghua Deng
Abstract

MOTIVATION: Direct analysis of microbial communities in the environment and the human body has become more convenient and reliable owing to advances in high-throughput sequencing techniques for 16S rRNA gene profiling. Inferring the correlations among members of microbial communities is of fundamental importance for genomic survey studies. Traditional Pearson correlation analysis, which treats the observed data as absolute abundances of the microbes, may lead to spurious results because the data represent only relative abundances. Special care and appropriate methods are required prior to correlation analysis for these compositional data.

RESULTS: In this article, we first discuss the correlation definition of latent variables for compositional data. We then propose a novel method called CCLasso, based on least squares with an ℓ1 penalty, to infer the correlation network for latent variables of compositional data from metagenomic data. An effective alternating direction algorithm from the augmented Lagrangian method is used to solve the optimization problem. The simulation results show that CCLasso outperforms existing methods, e.g. SparCC, in edge recovery for compositional data. It also compares well with SparCC in estimating the correlation network of microbe species from the Human Microbiome Project.

AVAILABILITY AND IMPLEMENTATION: CCLasso is open source and freely available from https://github.com/huayingfang/CCLasso under GNU LGPL v3.

CONTACT: [email protected]

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
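The point about spurious Pearson correlations can be illustrated with a small simulation. The sketch below is not CCLasso and is not taken from the paper; it only assumes NumPy, and the function names `closure` and `demo_spurious_correlation` are hypothetical. It draws independent absolute abundances, normalizes each sample to proportions, and compares the Pearson correlation matrices before and after normalization.

```python
import numpy as np


def closure(counts):
    """Convert absolute abundances (rows = samples) to relative abundances."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def demo_spurious_correlation(n_samples=2000, n_taxa=3, seed=0):
    """Return Pearson correlations of absolute and of relative abundances.

    The absolute abundances are drawn independently (an assumption made
    for this illustration), so their true pairwise correlations are zero.
    """
    rng = np.random.default_rng(seed)
    absolute = rng.lognormal(mean=0.0, sigma=1.0, size=(n_samples, n_taxa))
    relative = closure(absolute)
    corr_abs = np.corrcoef(absolute, rowvar=False)
    corr_rel = np.corrcoef(relative, rowvar=False)
    return corr_abs, corr_rel


if __name__ == "__main__":
    corr_abs, corr_rel = demo_spurious_correlation()
    print("Pearson correlation, absolute abundances:")
    print(np.round(corr_abs, 2))
    print("Pearson correlation, relative abundances:")
    print(np.round(corr_rel, 2))
```

Because each row of proportions sums to one, the components are forced to be negatively correlated even though the underlying abundances are independent. With identically distributed taxa the population value is -1/(p-1) for p taxa, so about -0.5 for the three taxa used here.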


Similar resources

A Bayesian method for detecting pairwise associations in compositional data

Compositional data consist of vectors of proportions normalized to a constant sum from a basis of unobserved counts. The sum constraint makes inference on correlations between unconstrained features challenging due to the information loss from normalization. However, such correlations are of long-standing interest in fields including ecology. We propose a novel Bayesian framework (BAnOCC: Bayes...


MPLasso: Inferring microbial association networks using prior microbial knowledge

Due to recent advances in high-throughput sequencing technologies, it has become possible to directly analyze microbial communities in the human body and the environment. To understand how microbial communities adapt, develop, and interact with the human body and the surrounding environment, one of the fundamental challenges is to infer the interactions among different microbes. However, due to the co...


Bayesian Quantile Regression with Adaptive Lasso Penalty for Dynamic Panel Data

Dynamic panel data models play an important part in medical, social and economic studies. The presence of the lagged dependent variable as an explanatory variable is a characteristic trait of these models. The estimation problem of these models arises from the correlation between the lagged dependent variable and the current disturbance. Recently, quantile regression to analyze dynamic pa...


Penalized Lasso Methods in Health Data: application to trauma and influenza data of Kerman

Background: Two main issues that challenge model building are the number of events per variable and multicollinearity among explanatory variables. Our aim is to review statistical methods that tackle these issues, with emphasis on the penalized Lasso regression model. The present study aimed to explain problems of traditional regressions due to small sample size and m...


Fused lasso with the adaptation of parameter ordering in combining multiple studies with repeated measurements.

Combining multiple studies is frequently undertaken in biomedical research to increase sample sizes for statistical power improvement. We consider the marginal model for the regression analysis of repeated measurements collected in several similar studies with potentially different variances and correlation structures. It is of great importance to examine whether there exist common parameters a...



Journal:
  • Bioinformatics

Volume 31, Issue 19

Pages  -

Publication date: 2015